%netscout - technical paper
%author - Aditya Joshi
\documentclass[twocolumn]{article}
\usepackage{graphicx}
%set dimensions of columns, gap between columns, and paragraph indent 
\setlength{\textheight}{8.75in}
\setlength{\columnsep}{2.0pc}
\setlength{\textwidth}{6.8in}
%\setlength{\footheight}{0.0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0.0in}
\setlength{\headsep}{0.0in}
\setlength{\oddsidemargin}{-.19in}
\setlength{\parindent}{1pc}
\makeatletter
\def\@normalsize{\@setsize\normalsize{10pt}\xpt\@xpt
\abovedisplayskip 10pt plus2pt minus5pt\belowdisplayskip 
\abovedisplayskip \abovedisplayshortskip \z@ 
plus3pt\belowdisplayshortskip 6pt plus3pt 
minus3pt\let\@listi\@listI}
%need an 11 pt font size for subsection and abstract headings 
\def\subsize{\@setsize\subsize{12pt}\xipt\@xipt}
%make section titles bold and 12 point, 2 blank lines before, 1 after 
\def\section{\@startsection {section}{1}{\z@}{1.0ex plus 1ex minus .2ex}{.2ex plus .2ex}{\large\bf}}
%make subsection titles bold and 11 point, 1 blank line  before, 1 after 
\def\subsection{\@startsection {subsection}{2}{\z@}{.2ex plus 1ex} {.2ex plus .2ex}{\subsize\bf}}
\makeatother


\begin{document}
%don't want date printed
\date{}
%make title bold and 14 pt font (Latex default is non-bold, 16pt) 
\title{\Large\bf netscout - Adaptive Network Monitoring and Threat Countering Mechanism}
\author{\begin{tabular}[t]{c@{\extracolsep{2em}}c} 
Aditya Joshi & G Sivakumar\\
Computer Science and Engineering & Computer Science and Engineering \\
Indian Institute of Technology Bombay, India  & Indian Institute of Technology Bombay, India\\
aditya.joshi.2000@gmail.com & siva@cse.iitb.ac.in \end{tabular}}
\maketitle

\subsection*{\centering Abstract}
%IEEE allows italicized abstract
{\em
Networks have policies in place to in order to ensure efficient and fair resource 
utilization and protection of network installations. An effective monitoring and
countering mechanism needs a complete picture of resource utilization and network
activity in order to act on time. Such information is typically found in heterogeneous
logs files.

netscout is a python based network filtering, monitoring and countering mechanism driven by
policies. Network traffic is observed in real-time and largely in a passive mode. Decisions
on future patterns of activity monitoring are done based on past and current user/host 
network activity information. The above mentioned 'predictive' model allows efficient 
information gathering and subsequent appropriate responses based on the information.
%end italics mode
}
\section{Introduction}

\begin{figure}
\includegraphics[width=60mm]{topo.png}
\caption{A typical university network topology}
\label{fig:nwTopo}
\end{figure}

Consider a typical university network as shown in, Figure~\ref{fig:nwTopo}. The depicted network
has interconnected residential (resnet), academic and hostel networks. Connection to the Internet is via
one or many router(s) and proxy server(s). Also, Proxy authentication via LDAP could be used in order to
access the Internet service. Though netscout can be run on any kind on network topology, the just described
topology is used as an example throughout the paper and is very similar to the type in which netscout has been
tested on.

The router(s) (nodes running LINUX) have the ability to act as vantage points inorder to monitor and control network traffic.
netscout \cite{netscout} can be run on the routers to build user/host network activity profile. 
The system can then either choose to act on the information in real time or transfer the obtained
information to a secondary part of the system for further analysis and countering.

\section{The netscout Advantage}
The strength of netscout lies is the ability to employ some 'intelligence' in packet gathering. As LANs
tend to generate huge amounts of traffic the network data sample should be small but useful. An approach where 
a sample of network data is gathered, sans any filters, the probability of finding useful information is
proportional to the data size. A large data size though tends to slow down further processing. For example,
consider (a) 1000 packets captured and P2P (peer-to-peer) packets looked for in them or (b) looking only for a threshold 
level of P2P packets. As evident, case (b) is more efficient as the data set captured contains relevant information 
and also the threshold levels decided by the administrator allows to control the number of P2P hosts  
intended to be discovered.  

netscout also tries to obtain a logical picture of the network activity. For example, once netscout has ascertained
that a host is involved in P2P activity it then tries to (if activated) obtain the proxy username being used
from that host. This is done by looking for HTTP request packets. The proxy username component now
allows netscout to build network profile of the user. 

Hosts added to netscout's database via static keyword matching techniques (say P2P), slowly form the
database for the future. For instance, hosts identified as involved in P2P activities are first
checked for their HTTP traffic. Websites accessed by these hosts are stored and classified. The classification
involves looking at keywords in the META section of the HTTP response obtained from the website. 
If website is found to be violating policy then, a further detailed analysis of the host network 
activity is done in order to ascertain if the host accesses too many policy violating websites. 
Furthermore, hosts accessing websites classified as policy violating are in future monitored for their network activity. 
In addition to keyword matching techniques few statistical techniques too are employed for traffic monitoring.
Parameters such as bandwidth usage and high incoming connections help identify potential network bottlenecks and
possible policy violating traffic.
In essence, netscout tries to maximize network information by efficiently gathering series of small network data sets.

 
\section{netscout Internals}
netscout is a python based network filtering, monitoring and threat countering mechanism. The 
packet gathering is largely done via existing tools such as ngrep\cite{ngrep},tcpdump\cite{tcpdump}
and tshark\cite{tshark}. The gathered data is then analysed and acted upon based on the network policies,
laid out in the netscout configuration file, which drive the application. 
Currently, netscout can monitor and detect:  
(1) unauthorized  P2P (DC++), (2) policy violating HTTP traffic, (3) MAC address spoofing, 
(4) heavy bandwidth usage and (5) 'hostile' traffic against important network resources. 
The information is persisted in a SQLite \cite{sqlite} database for basing future activity monitoring and 
reporting user/host network activity.
netscout has five modes of operation: {\bf passive}, {\bf protect}, {\bf counter}, {\bf update} and {\bf report} modes.
These modes can be activated/deactivated via the following "config" file parameters
\fbox{\begin{tabular}{p{7cm}}
\#\# Set to true to check for DC++ p2p traffic\\
dcppMode=false\\

\#\# Set to true to check for http traffic\\
httpMode=true\\

\#\# Set to true to do statistical analysis\\
statMode=false\\

\#\# Set to true to protect important resources (proxy,mail) \\
protectMode=false\\

\#\# Set to true to counter offending hosts\\
counterMode=false\\
\end{tabular}}


\section{Passive Mode}
The primary data gathering mode of netscout. Here traffic is listened on to a specified interface and data
obtained based on the configuration file. Also, statistical parameters like maximum number of connections to, bandwidth
usage are computed based on the thresholds decided in the configuration. Sample session of netscout invoked in
passive mode on the eth0 interface. 

\fbox{\begin{tabular}{p{7cm}}
\# netscout -i eth0\\ 
Screen logging on\\
nescout version 1.0\\
Ensure you are root!\\
Reading Configuration Parameters\\
Detailed Screen logging on\\
Using existing db file\\
netscout passive mode\\
Checking for tcpdump\\
tcpdump check ok\\
Checking for tshark\\
tshark check ok\\
Checking for ngrep\\
ngrep check ok\\
Checking if root permissions available\\
ROOT permissions available\\
Creating data directories\\
getting existing host info\\
Found existing 0 hosts\\
getting existing user info\\
Found existing 0 users\\
\end{tabular}}

\section*{P2P (DC++)}
Peer-to-peer shares\cite{dcpp} tend to generate lot of network traffic and create potential 
network bottlenecks. In addition to the large network volumes generated, it is the potential copyright infringement 
that P2P hubs can be used for that causes a need to monitor them. Also, not all P2P shares generate lot of
traffic or indulge in copyright infringement, those can be identified by analysing the bandwidth usages
, files hosted.  

netscout can identify hubs and clients operating in  a network without requiring prior knowledge of
such hosts and the need to connect to central peer hubs to obtain information. Working in a passive mode
and armed with keyword matching and protocol information, netscout can identify peers and also the files
shared and traded by peers. Based on the results netscout can form a database for future runs and attempt
to identify P2P hubs with more efficiency.  
The number of HUBS to be looked for can be controlled by setting the following appropriate configuration parameters:

\fbox{\begin{tabular}{p{7cm}}
\#\# Number of MyINFO packets to look for (ie number of HUBS to locate)\\
myinfo=18\\
\#\# Number of Search Results packets to look for (to get file list)\\
sr=26\\
\#\# Number of HUB-Client Packets to look for\\
hubClient=15
\end{tabular}}

Sample netscout P2P session:

\fbox{\begin{tabular}{p{7cm}}
Initialing DC++ P2P scouting \\
Looking for DC++ HUBS\\
Listening for MyINFO packets\\
Obtaining DC++ HUBS IP and PORT\\
Checking if User ID is already known\\
Obtaining User id for 10.2.1.1\\
Scanning HTTP requests for  10.2.1.1\\
adding host 10.2.1.1 and id pqrs123\\
Host not been seen in this session\\
User not been seen in this session\\
Getting File List for HUB 10.2.1.1\\
Searching for requested file extensions\\
Scanning for P2P clients\\
Scanning for P2P clients for HUB 10.2.1.1\\
clientList ['10.168.1.2']\\
Adding client information\\
Checking if User ID is already known\\
Obtaining User id for 10.168.1.2\\
Scanning HTTP requests for  10.168.1.2\\
adding host 10.168.1.2 and id abcd123\\
Host not been seen in this session\\
User not been seen in this session\\
Adding hub information
\end{tabular}}

The file types to be searched for can be set in the ext configuration file parameter:

\fbox{\begin{tabular}{p{7cm}}
ext=.mpg,.mpeg,.avi,.mov,.rm,.wmv, .mp3,.wma,.jpeg,.jpg,.png,.gif\\
\end{tabular}}


\section*{HTTP}
The type and kind of websites allowed access are driven by policies. Hence, monitoring the kind of
websites being accessed by a user/host gives insight into the future potential activity of the user/host.
Proxies generally hold black or white lists, determined by policies, to control Internet access. 
netscout, instead of such a static technique, uses keyword matching techniques to classify a website. The
classification is done by looking for keywords only the {\bf head} section containing the
meta and title information, instead of scanning a complete page. 
It has been found that few lines of text of the head section are sufficient in most cases to classify the website.
The categorization is based on the category-keyword pairs as decided by the policy. A user/host accessing
websites that violate policy is assumed to do so in the future too and subsequent runs for netscout 
keep that in mind when filtering traffic. This automated classification also allows now to identify unlisted
websites and keeping the website access lists updated. 

\section*{MAC Spoofing}
MAC spoofing often forms the backbone of malicious attacks and detecting it can go a long way
in thwarting potential future attacks. netscout, in each run, captures the MAC address of the active hosts. The MAC
thus obtained is compared when the host is active again in subsequent runs of netscout.  
Deviations from past history are reported.

\section*{OS Fingerprinting}
netscout uses the lightweight passive OS fingerprinting expertise of {\bf p0f} \cite{p0f}. Once potential policy 
violating hosts have been identified, netscout tries to garner more information about them. p0f allows
an efficient way to learn the OS make of the host allowing better response to threats.   
 
\section*{Bandwidth Usage}
Bandwidth usage monitoring goes in a long way to ensure fair and efficient usage of
network resources. netscout explicitly looks at P2P traffic utilizing protocol information,
but can also detect traffic bottlenecks. In this case, bandwidth usage of hosts is computed
and those high on the list are probed more to obtain information about the type of traffic
flowing through them. Another indicator is to look for number of destination connections 
terminating at a host. Hosts exhibiting high values of incoming connections are also probed
for their traffic types and patterns. This way services like LAN multiplayer games, streaming servers 
may be identified without specifically looking for them.

\fbox{\begin{tabular}{p{7cm}}
\#\# number of packets to look for connections terminating on a host in order to raise alarm\\ 
connTo = 50\\
\#\# minimum number of connections terminating on a host in order to raise alarm \\
connToMin = 20\\
\#\# ignore activity to and from this host. eg, a proxy server \\
ignoreHost = proxy.local.ac\\
\end{tabular}}


\section{Protect Mode}
A major goal of information filtering is to use it to protect important resources.
Mail, Proxy, academic servers, to name a few, need to be protected from internal threats. netscout
monitors the activity to specified important resources and alerts system administrators
of deviations from the norm. In addition activity from identified 'rouge' hosts is controlled 
(bandwidth throttling) via firewall rules (IPTABLES).

\fbox{\begin{tabular}{p{7cm}}
\#\# Network resources that need to be protected\\
impRes = proxy.local.ac, smtp.local.ac\\
\#\# number of packets to look for connections terminating on important hosts \\
prConnTo = 50\\
\#\# minimum number of connections terminating on a host in order to raise alarm \\
prConnToMin = 20\\
\end{tabular}}

\section{Counter Mode}
Once a threat or bandwidth hoarder is identified steps can be taken to limit the damage. 
netscout is able to either control bandwidth output from hosts or completely block them from 
network access for a pre-defined time duration. Also, if the proxy authentication from the
host is determined then e-mails can be sent to warn users about their actions. 

\section{Update Mode}

In the update mode netscout looks to improve on previous and add new web categorizations based on the 
category and keywords pertaining to it, as specified by the configuration file. 
The website categorization exercise is completely automated and can be carried out, if required,
independently during low network traffic hours. 
netscout can either update unclassified websites or all websites (classified
or unclassified) based on the category-keyword pairs as defined. The results of the website categorization 
may be reviewed, and if required, corrected by authorized personnel.  
\fbox{\begin{tabular}{p{7cm}}
\#\# Enter name value pairs of categories and keywords \\
Adult = sex,porn\\
Video = video,stream\\
SocialNetworking = friends,network,meet\\
News = news\\
\end{tabular}}

\section{Report Mode}
After a netscout run has finished the information collected can be retrieved via the report mode.
The report mode prints out a user/host wise display of network activity in a specified time period.
\fbox{\begin{tabular}{p{7cm}}
\#netscout -r\\
netscout report mode\\
------------------------------------------------------\\
------------------------------------------------------\\
HOST:10.2.2.1\\
	 Activity: 0\\
	 User ID : pqrs123\\
	 Time    : Sun Oct 26 09:28:37 2008\\
		 P2P Act : 0\\
		 P2P Type: SERVER\\
		 PORT    : 411\\
		 FILES: \\
abcd.mp3\\
23.avi\\
7.mp3\\
1x1.gif\\
------------------------------------------------------\\
------------------------------------------------------\\

HOST:10.168.1.2\\
	 Activity: 0\\
	 User ID : abcd123\\
	 Time    : Sun Oct 26 09:29:34 2008\\
		 P2P Act : 0\\
		 P2P Type: CLIENT\\
		 SERVER  : 10.2.2.47\\
\end{tabular}}

\section{Configuration/Policy file}
netscout behaviour is controlled by a configuration file where the kind of network 
activity to be monitored and the kind of countering mechanisms to be employed stated.

\section{Future Work}
Future work includes:
\begin{itemize}
\item Identifying 'tor' endpoints
\item More extensive host profiling to identify services running on a system`
\end{itemize}

\begin{thebibliography}{9}
\bibitem{netscout}
{\em http://netscout.googlecode.com/}, 2008.
\bibitem{nmdc}
{\em http://www.teamfair.info/wiki/}, 2008.
\bibitem{dcpp}
{\em https://launchpad.net/linuxdcpp}, 2008.
\bibitem{tcpdump}
{\em http://www.tcpdump.org}, 2008.
\bibitem{ngrep}
{\em http://ngrep.sourceforge.net}, 2008.
\bibitem{tshark}
{\em http://www.wireshark.org}, 2008.
\bibitem{p0f}
{\em http://lcamtuf.coredump.cx/p0f.shtml}, 2008.
\bibitem{sqlite}
{\em www.sqlite.org/},2008

\end{thebibliography}
\end{document}


